Method and apparatus for monitoring and updating system software

ABSTRACT

A probe determines a value for a metric for a site. A message including the value for the metric is generated and delivered to a monitoring apparatus. The monitoring apparatus determines whether the value is acceptable, based on the metric and possibly on the site that the probe is monitoring. If the value is not acceptable, the monitoring apparatus displays an alert indicating a possible problem.

FIELD OF THE INVENTION

This invention pertains to monitoring software for a variety of conditions, such as internal performance characteristics, liability warnings, programmatic errors, and the general health of the computer system.

BACKGROUND OF THE INVENTION

No matter the computer program, it is inevitable that there will be some bugs (that is, coding errors that cause the program to behave differently than expected). Production environments involve a number of variables that are difficult to reproduce in testing environments. As such, applications with thousands of interfaces can fail under a variety of changing conditions.

Because human intervention is required to maintain these applications, operations staff must complete certain tasks on a timely basis. Failure to operate and maintain the system within the application's published guidelines will result in a number of unacceptable issues. These include, but are not limited to, the following: inaccurate reporting of revenues; increased liability risks; increased risks to system availability; and increased costs due to additional manpower spent on corrective activities.

Customers want to know that their mission-critical system is performing at peak levels. They want to know when an area of the system is failing. They need to feel confident that the system and its integration with operations are running smoothly. Not knowing the health of the system's internal components can create a false sense of security.

Software companies sometimes also learn about defects from their customers. For a long time, customers had to contact the software companies (either by telephone or by e-mail) to report bugs. More recently, as exemplified by Microsoft® Windows® XP, the operating system offers to send an error report to the software company when a program crashes. That way, the software company is informed about serious errors. (Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States and other countries.)

Some third-party products exist that monitor the operation of systems from the outside. For example, Netcool, by Micromuse, collects information from APIs, log files, and other utilities, and forwards this information to a server for filtering. Patrol, by BMC Software, offers remote monitoring and full-application management. But both of these products are external to the applications being monitored, and they focus primarily on the environment surrounding the application. They cannot detect the internal health of the application itself, and thus their reporting value is limited in scope.

A need remains for a way to proactively detect application problems and software defects by monitoring internal application performance, beyond what the prior art provides.

SUMMARY OF THE INVENTION

The invention is an apparatus, system, and method for monitoring computers. A series of probes residing on a customer's computer determines values for metrics and sends these values to a monitoring apparatus. The monitoring apparatus determines whether the values for the metrics are acceptable. If they are not, an alert is displayed so that corrective measures can be initiated.

The foregoing and other features, objects, and advantages of the invention will become more readily apparent from the following detailed description, which proceeds with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a central server with a monitoring apparatus, communicating with probes on customer computers, according to an embodiment of the invention.

FIG. 2 shows details of the monitoring apparatus of FIG. 1.

FIG. 3 shows details of the probes of FIG. 1.

FIG. 4 shows the probes of FIG. 1 communicating with the monitoring apparatus of FIG. 1.

FIG. 5 shows a flowchart of the procedure for using the probes of FIG. 1.

FIGS. 6A-6C show a flowchart of the procedure for using the monitoring apparatus of FIG. 1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

FIG. 1 shows a central server with a monitoring apparatus, communicating with probes on customer computers, according to an embodiment of the invention. In FIG. 1, server 105 is a central server, operated by the company distributing software products to its customers. Customers operate, for example, computers 110, 115, and 120. A person skilled in the art will recognize, however, that there can be more or fewer than three customers, and that each customer can have more than one computer.

Installed on computers 110, 115, and 120 are probes 125, 130, and 135, respectively. Probes 125, 130, and 135 are responsible for determining the values associated with various metrics on computers 110, 115, and 120, and for transmitting these values back to server 105. The details of probes 125, 130, and 135 are discussed further with reference to FIGS. 3-4 below.

Server 105 includes monitoring apparatus 140. Monitoring apparatus 140 receives information from probes 125, 130, and 135, and determines whether the data received from the probes represent acceptable values. If the values are acceptable, monitoring apparatus 140 logs the values. Otherwise, monitoring apparatus 140 displays an alert indicating the unacceptable value. The details of monitoring apparatus 140 are discussed further with reference to FIG. 2 below.

Network 145 connects server 105 with computers 110, 115, and 120. Network 145 can be any variety of network including, among others, a local area network (LAN), a wide area network (WAN), a global network (such as the Internet), and a wireless network (for example, using Bluetooth or any of the IEEE 802.11 standards). In addition, a person skilled in the art will recognize that different networks can be used to connect server 105 with different computers. For example, server 105 might be connected to computer 110 using one network, and to computers 115 and 120 using a second network.

FIG. 2 shows details of the monitoring apparatus of FIG. 1. Monitoring apparatus 140 includes four components: receiver 205, tester 210, alerter 215, and log 220. Receiver 205 is responsible for receiving a message from a probe and parsing the message for the necessary information. Tester 210 then tests the value (or values) retrieved from the message by receiver 205 to determine whether the value is acceptable. Alerter 215 displays an alert if the value retrieved from the message is not acceptable. Finally, log 220 includes entries, like entry 225, which reflect the received message, its values, and/or whether each value is acceptable.
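To illustrate how the four components of FIG. 2 might cooperate, the following is a minimal Python sketch. It is not the patented implementation; it assumes a message already parsed into a dictionary with a `site` key and a `metrics` mapping, and a hypothetical `limits` table giving one inclusive acceptable range per metric. The class and method names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringApparatus:
    """Receiver, tester, alerter and log of FIG. 2, reduced to plain Python."""

    # Hypothetical acceptable ranges: metric -> (low, high), inclusive.
    limits: dict
    log: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def receive(self, message: dict) -> tuple:
        """Receiver: pull the site and the (metric, value) pairs out of a message."""
        return message["site"], list(message["metrics"].items())

    def test(self, metric: str, value: float) -> bool:
        """Tester: is the value inside the acceptable range for its metric?"""
        low, high = self.limits[metric]
        return low <= value <= high

    def alert(self, site: str, metric: str, value: float) -> None:
        """Alerter: a real system would display or page; here we print and record."""
        text = f"ALERT: {metric}={value} at {site} is not acceptable"
        print(text)
        self.alerts.append(text)

    def handle(self, message: dict) -> None:
        site, readings = self.receive(message)
        for metric, value in readings:
            ok = self.test(metric, value)
            # Like entry 225: the reading plus whether it was acceptable.
            self.log.append((site, metric, value, ok))
            if not ok:
                self.alert(site, metric, value)


monitor = MonitoringApparatus(limits={"transactions": (100_000, 1_000_000),
                                      "open_days": (0, 3)})
monitor.handle({"site": "site 1",
                "metrics": {"transactions": 500_000, "open_days": 5}})
```

With these assumed limits, the transaction count is logged as acceptable and the five open days produce one alert.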

To determine whether a value is acceptable, monitoring apparatus 140 uses database 230. Database 230 includes filters, such as filters 235, 240, and 245, which identify what values are considered acceptable. Different filters exist for different metrics. For example, filter 235 is a filter for the number of transactions occurring at a given location, whereas filter 245 is a filter for the number of open days experienced at a location.

Some filters, such as filter 245, can be used for all casino locations. But other metrics, such as the number of transactions, can vary from one location to another. To account for differing interpretations of acceptable values, different filters can be set up for a single metric, each filter identifying acceptable values for a different casino location. Thus, while filters 235 and 240 both represent acceptable values for the transactions metric, they represent acceptable values for different casinos.

Although a different filter can be set up for each different site for a given metric, the amount of variation in acceptable values might be limited. Where two or more sites agree on what constitutes an acceptable value for a given metric, there is no need for each site to have a separate filter. Thus, while FIG. 2 shows filters 235 and 240 being used for individual sites, a person skilled in the art will recognize that a single filter can be used for some (but not necessarily all) sites.

To select the appropriate filter, monitoring apparatus 140 uses selector 250. Selector 250 uses information from the message to select the appropriate filter. Selector 250 determines the metric represented in the message and, if necessary, the site at which the metric was measured. Selector 250 then uses these pieces of information to find the appropriate filter in database 230, so that tester 210 can determine whether the value is acceptable.
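One way to picture database 230 and selector 250 is a lookup keyed by metric and site. The Python sketch below is illustrative only: the `Filter` record, the `FilterDatabase` class, and the rule "use a site-specific filter if one exists, otherwise fall back to the metric-wide filter" are assumptions, not details fixed by the description. It also shows a single filter shared by several sites, and the casino names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    """Acceptable inclusive range for one metric (a hypothetical filter record)."""
    metric: str
    low: float
    high: float
    sites: tuple = ()  # empty tuple: the filter applies to every site

    def accepts(self, value: float) -> bool:
        return self.low <= value <= self.high


class FilterDatabase:
    def __init__(self, filters):
        # Index by (metric, site); a site-independent filter is stored under None.
        # A filter listing several sites is indexed once per site, so sites
        # that agree on acceptable values can share one filter.
        self._index = {}
        for f in filters:
            for site in f.sites or (None,):
                self._index[(f.metric, site)] = f

    def select(self, metric: str, site: str) -> Filter:
        """Selector: prefer a filter for this metric/site, else the metric-wide one."""
        found = self._index.get((metric, site))
        if found is None:
            found = self._index[(metric, None)]  # KeyError if no filter exists
        return found


db = FilterDatabase([
    Filter("transactions", 200_000, 800_000, sites=("casino A", "casino C")),
    Filter("transactions", 10_000, 100_000, sites=("casino B",)),
    Filter("open_days", 0, 3),
])
```

Here 500,000 transactions would be acceptable at casino A but not at casino B, while the open-days filter applies to every casino.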

FIG. 3 shows details of the probes of FIG. 1. In FIG. 3, probe 125 includes sensors 305, 310, and 315. Each sensor operates to determine values for a different metric for computer 110. For example, sensor 305 determines the number of transactions that occur in a given day in database 320 on computer 110, sensor 310 determines the number of open days at the site, and sensor 315 determines the number of fixes applied to software 325 on computer 110. A person skilled in the art will recognize that although three sensors are shown within probe 125, there can be fewer or more sensors in a given probe. In addition, there can be more than one probe for a given computer, each with the same or differing numbers of sensors.

Because sensor measurements are taken repeatedly, each of sensors 305, 310, and 315 includes a corresponding timer 330, 335, and 340. The timers ensure that the sensors take measurements according to regular schedules. Each timer can be set to a different interval, so that different metrics can be measured at different frequencies. But a person skilled in the art will recognize that, for sensors measuring metrics on the same schedule, a single timer can be used for more than one sensor.
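Per-sensor timers with different intervals can be modelled as a priority queue of due times. The Python sketch below is an assumption-laden illustration: `run_schedule` is a hypothetical name, time is simulated (in hours) rather than slept through so the schedule can be inspected, and each sensor is reduced to an interval plus a read function. When a timer ends, the sensor takes its measurement and the timer is reset, as in claim 5.

```python
import heapq


def run_schedule(sensors: dict, until: float) -> list:
    """Fire each sensor every `interval` time units up to and including `until`.

    sensors maps a sensor name to (interval, read_function). Time is
    simulated; a real probe would sleep or use an OS timer instead.
    """
    # Each heap entry is (next_due_time, name); popping it means the timer ended.
    heap = [(interval, name) for name, (interval, _) in sensors.items()]
    heapq.heapify(heap)
    readings = []
    while heap and heap[0][0] <= until:
        due, name = heapq.heappop(heap)
        interval, read = sensors[name]
        readings.append((due, name, read()))
        # Reset the timer for this sensor's next measurement.
        heapq.heappush(heap, (due + interval, name))
    return readings


# Hypothetical schedule: transactions daily, applied fixes weekly (hours).
readings = run_schedule({"transactions": (24, lambda: 500_000),
                         "fixes": (168, lambda: 2)}, until=168)
```

Over one simulated week this yields seven transaction readings and one fixes reading.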

Additionally, a sensor can be triggered by either of two mechanisms: a timer or an impromptu event. The latter is used to signal that a critical event has just taken place and requires immediate attention.
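The two trigger paths can be sketched as two entry points on a sensor. In this hypothetical Python sketch, `Sensor`, `on_timer`, `on_event`, the `outbox` list, and the `urgent` flag are all illustrative assumptions; the description says only that an event trigger signals a need for immediate attention, not how that is encoded.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Sensor:
    """A sensor with a timer trigger and an event trigger (hypothetical API)."""
    metric: str
    read: Callable[[], float]
    outbox: list = field(default_factory=list)

    def on_timer(self) -> None:
        # Routine, scheduled measurement.
        self.outbox.append({"metric": self.metric, "value": self.read(),
                            "urgent": False})

    def on_event(self, description: str) -> None:
        # Impromptu trigger: measure immediately and mark the reading urgent
        # so that it can be sent ahead of routine readings.
        self.outbox.append({"metric": self.metric, "value": self.read(),
                            "urgent": True, "event": description})


errors = Sensor("errors", read=lambda: 1)
errors.on_timer()
errors.on_event("payment posting failed")
```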

Finally, probe 125 includes message generator 345. Message generator 345 takes the measurements from the various sensors 305, 310, and 315, and assembles a message from the measurements. The message is then sent to the central server (not shown in FIG. 3). Message generator 345 can generate a single message for several metric measurements, or it can generate a separate message for each metric measurement.
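The choice between one combined message and one message per measurement can be sketched as a single flag. The dictionary layout and the name `generate_messages` below are assumptions for illustration; the description does not prescribe a message structure.

```python
from datetime import date


def generate_messages(site: str, day: date, readings: dict,
                      combine: bool = True) -> list:
    """Message generator: one message for all readings, or one per reading."""
    header = {"site": site, "date": day.isoformat()}
    if combine:
        return [{**header, "metrics": dict(readings)}]
    return [{**header, "metrics": {metric: value}}
            for metric, value in readings.items()]


readings = {"transactions": 500_000, "open_days": 5}
combined = generate_messages("site 1", date(2003, 8, 7), readings)
separate = generate_messages("site 1", date(2003, 8, 7), readings, combine=False)
```

Combining reduces the number of messages sent; sending separately lets each measurement travel as soon as it is taken.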

FIG. 4 shows the probes of FIG. 1 communicating with the monitoring apparatus of FIG. 1. In FIG. 4, message generator 345 is shown generating message 405. Message 405 is shown in greater detail in blow-up 410. The message is dated Aug. 7, 2003, and is from site 1 (which includes computer 110). Blow-up 410 shows two metric measurements: the site has measured 500,000 transactions and has five open days. The message can also include other metrics.
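The contents of blow-up 410 could be carried as a small structured text body. The Python sketch below assumes JSON as the wire format purely for illustration (the description does not name a format); `encode` and `decode` are hypothetical helper names, and `decode` stands in for the first parsing step of receiver 205.

```python
import json
from datetime import date


def encode(site: str, day: date, metrics: dict) -> str:
    """Serialize a probe message; JSON is an assumed format, not one the text fixes."""
    return json.dumps({"date": day.isoformat(), "site": site, "metrics": metrics},
                      sort_keys=True)


def decode(body: str) -> dict:
    """What the receiver might do first: turn the text back into fields."""
    return json.loads(body)


# The message of blow-up 410.
body = encode("site 1", date(2003, 8, 7), {"transactions": 500_000, "open_days": 5})
print(body)
```

This prints `{"date": "2003-08-07", "metrics": {"open_days": 5, "transactions": 500000}, "site": "site 1"}`.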

Once message 405 is generated, it is delivered to e-mail server 415. E-mail server 415 is responsible for starting message 405 along its journey to receiver 205 in the central server. Although e-mail server 415 is shown as a component of computer 110, a person skilled in the art will recognize that e-mail server 415 can be part of a separate computer, distinct from computer 110, or can be a dedicated e-mail server. A typical implementation would most likely use the customer's existing e-mail infrastructure. This provides a number of benefits, including cost savings from eliminating a second server and cost avoidance from not having to support and maintain the additional hardware.
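Handing a probe message to an existing mail server can be done with Python's standard `email` and `smtplib` modules, as in the sketch below. The sender and recipient addresses, the subject line, and the SMTP host are hypothetical placeholders; the sketch assumes the customer's server accepts plain SMTP on the given port and does not cover authentication or TLS.

```python
import smtplib
from email.message import EmailMessage


def build_probe_email(body: str, sender: str, recipient: str, site: str) -> EmailMessage:
    """Wrap a probe message body in an e-mail addressed to the monitoring apparatus."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"probe report: {site}"  # assumed subject convention
    msg.set_content(body)
    return msg


def send_via_existing_server(msg: EmailMessage, host: str, port: int = 25) -> None:
    """Hand the message to the customer's existing SMTP server for delivery."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


msg = build_probe_email('{"site": "site 1"}', "probe@site1.example",
                        "monitor@vendor.example", "site 1")
# send_via_existing_server(msg, "mail.site1.example")  # hypothetical host
```

Reusing the customer's mail server means the probe needs only outbound SMTP access rather than a dedicated connection to the central server.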

FIG. 5 shows a flowchart of the procedure for using the probes of FIG. 1. At step 505, the probe accesses a value for a metric. The value can be accessed from a database or from software. As shown by arrow 510, step 505 can be repeated as often as necessary to access values for multiple metrics. At step 515, a message is generated. At step 520, the value for the metric accessed in step 505 is included in the message. At step 525, the site is included in the message, so that the central server knows where the message originated. At step 530, the message is delivered to the e-mail server, and at step 535, the e-mail server sends the message to the monitoring apparatus.
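The probe procedure of FIG. 5 maps onto a short function. In this hypothetical Python sketch, each metric source is a zero-argument callable (standing in for a database query or a query against installed software), and the `deliver` callback stands in for the e-mail server of steps 530-535.

```python
from typing import Callable


def run_probe(site: str, sources: dict,
              deliver: Callable[[dict], None]) -> dict:
    """Steps 505-535 of FIG. 5; `sources` maps metric name -> read function."""
    # Step 505, repeated per arrow 510: access a value for each metric.
    values = {metric: read() for metric, read in sources.items()}
    # Steps 515-525: generate the message with the values and the site.
    message = {"site": site, "metrics": values}
    # Steps 530-535: hand the message to the e-mail server for forwarding.
    deliver(message)
    return message


sent = []
run_probe("site 1", {"transactions": lambda: 500_000, "open_days": lambda: 5},
          deliver=sent.append)
```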

FIGS. 6A-6C show a flowchart of the procedure for using the monitoring apparatus of FIG. 1. In FIG. 6A, at step 605, the monitoring apparatus receives a message from a probe. At step 610, the metric is determined from the message. At step 615, a value for the metric is determined. At step 620, a site for the probe is determined.

At step 625 (FIG. 6B), the monitoring apparatus determines whether the metric is site-specific. If the metric is site-specific, then at step 630 the monitoring apparatus determines an acceptable value or range of values for the metric/site combination. Otherwise, at step 635, the monitoring apparatus determines an acceptable value or range of values for the metric, without regard to the site of the probe. Either way, at step 640, the system compares the value from the message with the acceptable value or range.

At step 645 (FIG. 6C), the monitoring apparatus determines whether the value for the metric is acceptable. If the value is acceptable, then at step 650 the monitoring apparatus logs the value for the metric and the site from which the value was received. Otherwise, at step 655, the monitoring apparatus displays an alert, letting someone know about a potential problem.

As shown in FIG. 6A, certain steps can be omitted or repeated. For example, since a single message can include values for multiple metrics, steps 610 and 615 can be repeated. Also, if the metrics are not site-specific, step 620 can be omitted (although typically the site is transmitted as part of the message, even if the metric is not site-specific). Finally, as shown in FIG. 6C, if the value for the metric is acceptable, the value does not need to be logged, although, again, typically the value is logged.
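The whole procedure of FIGS. 6A-6C, including the repeated and optional steps, can be sketched as one loop. In this hypothetical Python sketch, `site_limits` maps each site-specific metric to a per-site range and `general_limits` holds ranges for all other metrics; these two tables stand in for the filters of database 230 and are assumptions for illustration.

```python
def process_message(message: dict, site_limits: dict, general_limits: dict,
                    log: list, alerts: list, log_acceptable: bool = True) -> None:
    """Steps 605-655 of FIGS. 6A-6C.

    site_limits: metric -> {site: (low, high)} for site-specific metrics.
    general_limits: metric -> (low, high) for all other metrics.
    """
    site = message.get("site")  # step 620; may be absent
    for metric, value in message["metrics"].items():  # steps 610-615, repeated
        if metric in site_limits:  # step 625: is the metric site-specific?
            low, high = site_limits[metric][site]  # step 630
        else:
            low, high = general_limits[metric]  # step 635
        if low <= value <= high:  # steps 640-645
            if log_acceptable:  # logging acceptable values is optional
                log.append((site, metric, value))  # step 650
        else:
            alerts.append(f"{metric}={value} at {site}")  # step 655


log, alerts = [], []
process_message({"site": "site 1",
                 "metrics": {"transactions": 500_000, "open_days": 5}},
                site_limits={"transactions": {"site 1": (200_000, 800_000)}},
                general_limits={"open_days": (0, 3)},
                log=log, alerts=alerts)
```

With these assumed limits, the transaction count is logged and the open-days value raises an alert.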

A person skilled in the art will recognize that an embodiment of the invention described above can be implemented using a computer. In that case, the method is embodied as instructions that make up a program. The program may be stored on computer-readable media, such as floppy disks, optical discs (such as compact discs), or fixed disks (such as hard drives), and can be resident in memory, such as random access memory (RAM), read-only memory (ROM), firmware, or flash RAM memory. The program as software can then be executed on a computer to implement the method. The program, or portions of its execution, can be distributed over multiple computers in a network.

Having illustrated and described the principles of the invention in a preferred embodiment thereof, it should be readily apparent to those skilled in the art that the invention can be modified in arrangement and detail without departing from such principles. All modifications coming within the spirit and scope of the accompanying claims are claimed.

1. A probe apparatus comprising: a first sensor to capture a first value for a first metric on a computer; and a message generator operative to send a first message to a central site, the message including the first value.
2. A probe apparatus according to claim 1, further comprising a second sensor to capture a second value for a second metric on the computer.
3. A probe apparatus according to claim 2, wherein the message generator is operative to send a second message to the central site, the second message including the second value.
4. A probe apparatus according to claim 2, wherein the message generator is operative to include the second value in the first message.
5. A probe apparatus according to claim 1, further comprising a timer, the first sensor operative to capture the first value for the first metric when the timer ends and to reset the timer.
6. A probe apparatus according to claim 1, wherein: the computer includes a software package; and the probe monitors the software package.
7. A probe apparatus according to claim 1, wherein: the computer includes a database; and the probe monitors the database.
8. A monitoring apparatus, the apparatus comprising: a message receiver to receive a first message from a first site, the first message including a first value for a first metric; a tester to determine if the first value is acceptable; and an alerter to alert someone if the first value is not acceptable.
9. A monitoring apparatus according to claim 8, wherein: the tester includes a first filter, the first filter defining a range of acceptable values for the first metric; and the tester is operative to compare the first value with the range of acceptable values for the first filter.
10. A monitoring apparatus according to claim 9, wherein the tester includes: a plurality of filters, each filter defining a range of acceptable values for a metric; and a selector to select the first filter from the plurality of filters based on the first metric in the first message.
11. A monitoring apparatus according to claim 10, wherein: the plurality of filters includes at least one filter defining a range of acceptable values for the first metric associated with a site; and the selector is operative to select the first filter from the plurality of filters based on a first site in the first message.
12. A monitoring apparatus according to claim 8, further comprising a log, the log including an entry corresponding to the first message.
13. A system for monitoring software, comprising: a central computer; a monitoring apparatus installed in the central computer; a first computer; a first probe installed in the first computer; and a network connecting the central computer and the first computer.
14. A system according to claim 13, wherein: the system further comprises: a second computer; and a second probe installed in the second computer; and the network connects the central computer and the second computer.
15. A system according to claim 13, wherein: the first computer includes a software package; and the first probe monitors the software package.
16. A system according to claim 13, wherein: the first computer includes a database; and the first probe monitors the database.
17. A system according to claim 13, wherein: the monitoring apparatus includes: a message receiver to receive a first message from a first site, the first message including a first value for a first metric; a tester to determine if the first value is acceptable; and an alerter to alert someone if the first value is not acceptable; and the first probe includes: a first sensor to capture a first value for a first metric; and a message generator operative to send a first message to a central site, the message including the first value.
18. A system according to claim 13, wherein the first computer includes an e-mail server to generate a message from the first probe to the monitoring apparatus.
19. A method for using a probe, comprising: accessing a first value for a first metric by the probe; generating a message by the probe, the message including the first value for the first metric; and sending the message to a monitoring apparatus by the probe.
20. A method according to claim 19, wherein sending the message includes: delivering the message to an e-mail server by the probe; and delivering the message to the monitoring apparatus by the e-mail server.
21. A method according to claim 19, wherein accessing the first value includes accessing a software package by the probe.
22. A method according to claim 19, wherein accessing the first value includes accessing a database by the probe.
23. A method according to claim 19, wherein generating a message further includes generating the message by the probe, the message including the first value for the first metric and an identifier for a site of the probe.
24. A method for using a monitoring apparatus, comprising: receiving a message; determining a first value for a first metric from the message; determining if the first value for the first metric is acceptable; and if the first value for the first metric is not acceptable, displaying an alert that the first value for the first metric is not acceptable.
25. A method according to claim 24, further comprising, if the first value for the first metric is acceptable, logging the first value for the first metric.
26. A method according to claim 24, wherein: determining a first value includes determining the first value for the first metric for a first site from the message; and determining if the first value for the first metric is acceptable includes determining if the first value for the first metric for the first site is acceptable.
27. A method according to claim 24, wherein determining if the first value for the first metric is acceptable includes comparing the first value for the first metric with at least one acceptable value.
28. A method according to claim 24, wherein determining if the first value for the first metric is acceptable includes determining if the first value for the first metric is within a range of acceptable values.
29. A method according to claim 24, wherein receiving a message includes: accessing the first value for the first metric by a probe; generating the message by the probe; and sending the message to the monitoring apparatus by the probe.
30. A method according to claim 29, wherein sending the message includes: delivering the message to an e-mail server by the probe; and delivering the message to the monitoring apparatus by the e-mail server.
31. A method according to claim 29, wherein accessing the first value includes accessing a software package by the probe.
32. A method according to claim 29, wherein accessing the first value includes accessing a database by the probe.
33. A method according to claim 29, wherein generating a message further includes generating the message by the probe, the message including the first value for the first metric and an identifier for a site of the probe.
34. Computer-readable media containing a program to use a probe, the program comprising: software to access a first value for a first metric by the probe; software to generate a message by the probe, the message including the first value for the first metric; and software to send the message to a monitoring apparatus by the probe.
35. Computer-readable media according to claim 34, wherein the software to send the message includes: software to deliver the message to an e-mail server by the probe; and software to deliver the message to the monitoring apparatus by the e-mail server.
36. Computer-readable media according to claim 34, wherein the software to access the first value includes software to access a software package by the probe.
37. Computer-readable media according to claim 34, wherein the software to access the first value includes software to access a database by the probe.
38. Computer-readable media according to claim 34, wherein the software to generate a message further includes software to generate the message by the probe, the message including the first value for the first metric and an identifier for a site of the probe.
39. Computer-readable media containing a program to use a monitoring apparatus, the program comprising: software to receive a message; software to determine a first value for a first metric from the message; software to determine if the first value for the first metric is acceptable; and if the first value for the first metric is not acceptable, software to display an alert that the first value for the first metric is not acceptable.
40. Computer-readable media according to claim 39, further comprising, if the first value for the first metric is acceptable, software to log the first value for the first metric.
41. Computer-readable media according to claim 39, wherein: the software to determine a first value includes software to determine the first value for the first metric for a first site from the message; and the software to determine if the first value for the first metric is acceptable includes software to determine if the first value for the first metric for the first site is acceptable.
42. Computer-readable media according to claim 39, wherein the software to determine if the first value for the first metric is acceptable includes software to compare the first value for the first metric with at least one acceptable value.
43. Computer-readable media according to claim 39, wherein the software to determine if the first value for the first metric is acceptable includes software to determine if the first value for the first metric is within a range of acceptable values.
44. Computer-readable media according to claim 39, wherein the software to receive a message includes: software to access the first value for the first metric by a probe; software to generate the message by the probe; and software to send the message to the monitoring apparatus by the probe.
45. Computer-readable media according to claim 44, wherein the software to send the message includes: software to deliver the message to an e-mail server by the probe; and software to deliver the message to the monitoring apparatus by the e-mail server.
46. Computer-readable media according to claim 44, wherein the software to access the first value includes software to access a software package by the probe.
47. Computer-readable media according to claim 44, wherein the software to access the first value includes software to access a database by the probe.
48. Computer-readable media according to claim 44, wherein the software to generate a message further includes software to generate the message by the probe, the message including the first value for the first metric and an identifier for a site of the probe.